Abstract

We showcase in this paper the use of some tools from net- 
work theory to describe the strategy of football teams. Us- 
ing passing data made available by FIFA during the 2010 
World Cup, we construct for each team a weighted and di- 
rected network in which nodes correspond to players and 
arrows to passes. The resulting network or graph provides a
direct visual inspection of a team's strategy, from which we
can identify play patterns, determine hot spots of play
and locate potential weaknesses. Using different centrality
measures, we can also determine the relative importance
of each player in the game, the 'popularity' of a player, and
the effect of removing players from the game.

1 Introduction 

Graphs or networks arise in the study of a variety of problems,
ranging from technological and transport issues to
social phenomena and biological problems [1]-[3]. Their
prevalence is such that a rich mathematical theory has been
developed around them, notably by Euler, in relation to the
Königsberg bridge problem, Erdős and many others.

In the world of sports, team sports involving passes between
players provide interesting examples of
networks. Our goal in this paper is to show how the mathematical
theory of networks can be used to analyze statistical
information of team sports and measure the performance of 
a team and its players. As a proof of concept, we apply our 
ideas to construct a network analysis of some of the teams 
participating in the football 2010 World Cup. 

Arguably the most popular sport in the world, football
(soccer for our American readers) has traditionally lagged
behind other sports, such as baseball or basketball, in terms
of statistical information made available after games. The
unique nature of football games, with their constant ball
flow and low scores compared to other sports,
makes simple statistics such as assists or number of goals
insufficient as measures of team and player performance.
Fortunately, it seems that the situation is changing. In recent
years, starting with the UEFA 2008 Euro Cup, an unprecedented
amount of statistical data has been made public
after games. The release of significantly larger amounts
of data opens the way for building new and more detailed
analyses of football. Some recent attempts in this
direction can be found in [4]-[9].

Here we focus on finding a quantifiable representation 
of a team's style using network theory. All renowned foot- 
ball teams in history have displayed a recognizable foot- 
print in their game-play, which has always been thought of 
as something observed by football experts rather than de- 
scribed by game statistics. To reveal this footprint, we use 
the passing distribution of a team to construct a weighted 
and oriented network, with nodes corresponding to players 
and weighted arrows to the number of successful passes 
between players. By attaching each node to the tactical 
positioning of the team, we then obtain an immediate pic- 
ture of the team's style, which can profitably be used to 
observe overused and underused areas of the pitch or to de- 
tect potential performance problems between certain play- 
ers. By computing certain network invariants, such as cen- 
trality measures, we can also analyze a team's performance 
as well as the contributions of each of its players. These 
measures yield, as will be seen, a lot of useful information 
despite the relatively small size (11 players) of passing net- 
works. 

2 The network of a football team 

We define the passing network of a team as the network
containing the team's players as nodes and connecting arrows
between two players weighted by the number of successful
passes completed between them. Although networks
are, technically speaking, only topological in nature, we
use the passing network as a tool for visualizing a team's
strategy by fixing its nodes in positions roughly corresponding
to the players' formation on the pitch (see Figure 1).

The passing network is by all means an oversimplification
of a football game, as players do not remain in static
positions during games. However, the network, with its arrows
represented in various thicknesses and hues, does provide
an immediate insight into a team's tactics. It can
be used, for example, to determine areas of the pitch that
are favored or neglected, whether the team tends to use or
abuse short distance or long distance passes, and whether
a player is not intervening enough in a game. The network
can also be used by a team to detect under-performing
players, fix weak spots, detect potential problems between
teammates who are not passing the ball as often as their
position dictates, as well as to detect weaknesses in rivals.

Figure 1: Passing networks for the Netherlands and Spain drawn before the final game, using the passing data and tactical
formations of the semi-finals.

This basic visual analysis can be made more quantita- 
tive by computing global network invariants, which charac- 
terize a team as a whole, or local invariants, which provide 
insight about individual players. The computation of some 
of these invariants is described in the next section. They all
rely on the (weighted) adjacency matrix $A$, having as entry
$A_{ij}$ the number of passes from player $i$ to player $j$.
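
As a minimal sketch of how such a matrix could be assembled, assume the passing data is available as a list of (passer, receiver, count) triples. This input format, the helper name `build_adjacency` and the toy numbers are illustrative assumptions, not the format of the FIFA data.

```python
def build_adjacency(players, passes):
    """Return the weighted adjacency matrix A as a list of lists.

    players: list of player names, fixing the node order (assumed input).
    passes:  iterable of (passer, receiver, count) triples (assumed format).
    A[i][j] is the number of passes from player i to player j.
    """
    index = {name: i for i, name in enumerate(players)}
    n = len(players)
    A = [[0] * n for _ in range(n)]
    for passer, receiver, count in passes:
        if passer == receiver:
            continue  # a pass needs two distinct players
        A[index[passer]][index[receiver]] += count
    return A


# Toy data for three players; the counts are made up for illustration.
players = ["Xavi", "Iniesta", "Villa"]
passes = [("Xavi", "Iniesta", 12), ("Iniesta", "Xavi", 9),
          ("Xavi", "Villa", 4), ("Iniesta", "Villa", 6)]
A = build_adjacency(players, passes)
```

The matrix is not symmetric in general: here Xavi passes to Villa, but Villa makes no passes at all, so his row is zero.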

The weight (number of passes) will be used as a measure
of the strength of an arrow in the network and also to
define a notion of distance between players. This distance
$d_{ij}$ is defined precisely as the geodesic distance given by
the length of the shortest path connecting the nodes $i$ and $j$,
where the length of a path is obtained by adding the lengths
$\ell_{ij}$ of the arrows according to

$$
\ell_{ij} =
\begin{cases}
0 & \text{if } i = j, \\
1/A_{ij} & \text{if } i \neq j.
\end{cases}
$$

The length of an arrow between two players is considered
infinite if they do not pass the ball to each other. It is worth
noting that our definition of distance does not need to be
symmetric (i.e., one can have $d_{ij} \neq d_{ji}$), and is not necessarily
correlated with the physical distance between the
players on the field. For some computations we will also
use the non-weighted adjacency matrix $E = (e_{ij})$, where
$e_{ij} = 1$ if $A_{ij} > 0$ and $e_{ij} = 0$ otherwise.

An example of a global invariant is the node connectivity,
defined as the minimum number of nodes one needs to
remove in a network to make it disconnected. By the very
nature of football games, this is not a good invariant to consider
because the passing network of a team is usually very
close to being complete, and thus has a very high degree of
node connectivity. More useful is the edge connectivity,
defined as the minimum number of edges one needs to re- 
move to make the network disconnected. This gives us a 
good measure of the game-play robustness, as it represents 
the smallest number of passes that need to be intercepted 
to interrupt a team's 'natural flow' and to isolate a subset 
of its players, either by preventing the ball from reaching 
them or, if the edge connectivity is computed without the 
passes' directions, by completely isolating them from the 
rest of their teammates. 
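
The geodesic distance introduced earlier in this section can be computed with any all-pairs shortest-path method. The sketch below uses the Floyd-Warshall algorithm and assumes that the length of an arrow is the inverse of its number of passes; the function name and the toy matrix are illustrative only.

```python
from fractions import Fraction
import math


def geodesic_distances(A):
    """All-pairs geodesic distances for arrow lengths 1/A[i][j].

    A missing arrow (A[i][j] == 0) has infinite length. Exact fractions
    are used so that ties between paths are not blurred by rounding.
    """
    n = len(A)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = Fraction(0)
        for j in range(n):
            if i != j and A[i][j] > 0:
                d[i][j] = 1 / Fraction(A[i][j])
    # Floyd-Warshall: allow each player m in turn as an intermediate stop.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


# Toy matrix: player 0 rarely passes directly to player 2,
# but both arrows 0 -> 1 and 1 -> 2 are heavily used.
A = [[0, 10, 1],
     [2, 0, 10],
     [0, 0, 0]]
d = geodesic_distances(A)
```

In this toy case the route through player 1 (length 1/10 + 1/10) is shorter than the direct arrow of length 1, and player 2, who makes no passes, is at infinite distance from everybody else.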

3 Player performance 

The individual contribution of a player in a team can be in- 
ferred from local network invariants of the passing network 
and, in particular, from centrality measures, which define 
the relevance or popularity of a player according to dif- 
ferent parameters. We define in this section three of these 



measures and discuss their meaning in the context of foot- 
ball. 

3.1 Closeness 

The closeness centrality or closeness score of a player $i$ is
one of the simplest notions of node centrality, defined as the
inverse of the average geodesic distance of that node in the
network [2]:

$$
C_i = \frac{20}{\sum_{j \neq i} \left( d_{ij} + d_{ji} \right)}.
$$



For simplicity, we are giving equal weight to outgoing and
incoming passes in this measure, but this can be adjusted
by introducing arbitrary weights into the equation.

The closeness score provides a direct measure of how
easy it is to reach a particular player within a team. A high
closeness score corresponds to a small average distance,
indicating a well-connected player within the team.
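
A minimal sketch of the equal-weight closeness score, taken as the inverse of the average of the outgoing and incoming geodesic distances of a player. It assumes a precomputed distance matrix and at least two players; the function name, the toy distances and the convention of assigning a score of 0 to a player who cannot be reached are assumptions of this example.

```python
import math


def closeness(d):
    """Closeness score of every player from a distance matrix d.

    d[i][j] is the geodesic distance from player i to player j.
    The score is 2(n - 1) divided by the sum of all outgoing and
    incoming distances of the player (the factor is 20 for n = 11).
    """
    n = len(d)
    scores = []
    for i in range(n):
        total = sum(d[i][j] + d[j][i] for j in range(n) if j != i)
        # Assumed convention: an unreachable player gets score 0.
        scores.append(0.0 if math.isinf(total) else 2 * (n - 1) / total)
    return scores


# Toy symmetric distances for three players standing "in a line".
d = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
scores = closeness(d)
```

The middle player, who is at distance 1 from both teammates, obtains the highest score.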

3.2 Betweenness 

A very different notion of centrality is betweenness centrality,
which measures the extent to which a node lies on
paths between other nodes [2]. This quantity is defined as
the percentage of shortest paths that go through player $i$:

$$
C_B(i) = \frac{1}{90} \sum_{\substack{j \neq k \\ j, k \neq i}} \frac{n^i_{jk}}{g_{jk}},
$$

where $n^i_{jk}$ is the number of geodesic paths from $j$ to $k$ going
through $i$ and $g_{jk}$ is the total number of geodesic paths from $j$ to $k$.
The normalization factor $1/90$ ensures $0 \leq C_B(i) \leq 1$.

Betweenness does not measure how well-connected a 
player is, but rather how the ball-flow between other play- 
ers depends on that particular player i. It thus provides 
a measure of the impact of removing that player from the 
game, either by getting a red card or by being isolated by 
the rival's defense. A betweenness score of 0 means, in
particular, that a player is not getting involved in the game,
and so can be removed without much effect.

From a tactical point of view, a team should seek betweenness
scores that are evenly distributed among players:
concentrated betweenness scores that are on the high side
indicate a high dependence on a few overly important players,
whereas well distributed, low betweenness scores are an
indication of a well-balanced passing strategy.
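
The following sketch shows one way the betweenness score could be computed by brute force for a network as small as a football team. It assumes arrow lengths equal to the inverse of the number of passes and at least three players, counts geodesic paths exactly with fractions, and normalizes by (n - 1)(n - 2), which equals 90 for 11 players. It is not necessarily the procedure used by the libraries cited in this paper.

```python
from fractions import Fraction
import math


def betweenness(A):
    """Betweenness of each player in a weighted, directed network A."""
    n = len(A)
    # Arrow lengths as exact fractions; None marks a missing arrow.
    length = [[(1 / Fraction(A[i][j])) if (i != j and A[i][j] > 0) else None
               for j in range(n)] for i in range(n)]

    # Geodesic distances by Floyd-Warshall.
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = Fraction(0)
        for j in range(n):
            if length[i][j] is not None:
                d[i][j] = length[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]

    # sigma[s][t]: number of geodesic paths from s to t, filled in order
    # of increasing distance from s (all arrow lengths are positive).
    sigma = [[0] * n for _ in range(n)]
    for s in range(n):
        sigma[s][s] = 1
        targets = sorted((t for t in range(n)
                          if t != s and d[s][t] != math.inf),
                         key=lambda t: d[s][t])
        for t in targets:
            sigma[s][t] = sum(sigma[s][u] for u in range(n)
                              if u != t and length[u][t] is not None
                              and d[s][u] + length[u][t] == d[s][t])

    scores = []
    for i in range(n):
        total = Fraction(0)
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3 or sigma[j][k] == 0:
                    continue
                if d[j][i] + d[i][k] == d[j][k]:  # i lies on a geodesic
                    total += Fraction(sigma[j][i] * sigma[i][k], sigma[j][k])
        scores.append(float(total / ((n - 1) * (n - 2))))
    return scores


# Toy chain 0 -> 1 -> 2: every pass from 0 to 2 must go through 1.
scores = betweenness([[0, 5, 0],
                      [0, 0, 5],
                      [0, 0, 0]])
```

In the toy chain only one of the two ordered pairs of other players, (0, 2), is connected at all, and its single geodesic goes through player 1, so his score is 1/2 while the two end players score 0.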

Performance indicators based on betweenness centrality
have been previously employed in the context of a team's
activity [4]. Further details on the computation of betweenness
in directed networks with weights can be found in
[10].



3.3 Pagerank 

Pagerank centrality, introduced in [11], is a recursive notion
of 'popularity' or importance which follows the principle
that 'a player is popular if he gets passes from other
popular players'. Mathematically, pagerank centrality is
defined by

$$
x_i = p \sum_{j \neq i} \frac{A_{ji}}{L_j^{\text{out}}} \, x_j + q,
$$

where $L_j^{\text{out}} = \sum_k A_{jk}$ is the total number of passes made
by player $j$, $p$ is a heuristic parameter representing the probability
that a player will decide to give the ball away rather
than keep it and go for a shot himself, and $q$ is a parameter
awarding a 'free' popularity to each player. Note that the
pagerank score of a player depends on the scores of all his
teammates. As a result, all pagerank scores in a team must
be computed at the same time.

Pagerank centrality roughly assigns to each player the
probability that he will have the ball after a reasonable number
of passes has been made. If additional precision is required
for this measurement, the probability $p$ can be replaced
by player-dependent probabilities $p_i$, which would
make more sense if certain players are more prone to keep
the ball than others. In either case, the value of $p$ (or the
$p_i$'s) does not come from the network alone, as it might
in general be very different from one team to another, and
should be determined by heuristics. As a proof of concept,
in our analysis we will use a uniform value of $p = 0.85$ and
$q = 1$ for all the teams studied.
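
Because every pagerank score depends on all the others, the scores have to be obtained jointly. One simple possibility, sketched below, is a fixed-point iteration of the recursive definition: each score is repeatedly recomputed as p times the pass-weighted sum of the teammates' scores, plus q. The iteration scheme, the function name and the parameters `tol` and `max_iter` are assumptions of this example rather than the procedure used in our analysis.

```python
def pagerank_scores(A, p=0.85, q=1.0, tol=1e-12, max_iter=10000):
    """Iterate x_i = p * sum_{j != i} (A[j][i] / L_out[j]) * x_j + q.

    A[j][i] is the number of passes from player j to player i and
    L_out[j] the total number of passes made by player j. Players who
    make no passes at all are skipped in the sum.
    """
    n = len(A)
    L_out = [sum(row) for row in A]
    x = [q] * n
    for _ in range(max_iter):
        new_x = []
        for i in range(n):
            received = sum(A[j][i] / L_out[j] * x[j]
                           for j in range(n) if j != i and L_out[j] > 0)
            new_x.append(p * received + q)
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Toy cycle 0 -> 1 -> 2 -> 0 in which every player passes equally often.
x = pagerank_scores([[0, 3, 0],
                     [0, 0, 3],
                     [3, 0, 0]])
```

For this symmetric toy cycle all players must have the same score, which then satisfies x = p x + q, that is x = q / (1 - p).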

4 Clustering and communities 

An interesting aspect of football is how tightly players in- 
teract in a team. The notion of clustering tells us precisely 
that: it is a measure of the degree to which nodes in a net- 
work tend to cluster together. 

The clustering coefficient of a node in a weighted network
was originally defined in [12]; in our analysis we use
a slight modification of that notion (see [13] for a comparison
of the different definitions) given by

where $u_i = \sum_j e_{ij}$ is the number of passes made by player
$i$, also known as the vertex out-degree.

The clustering coefficient accounts, technically speaking,
for the transitivity of the network by counting the percentage
of all possible triangles containing the node $i$. To
make this more precise, imagine that player $j$ wants to pass
the ball to player $k$, but since this passing line is well defended,
he has to go through player $i$ to reach player $k$, thus
making the path $j \to i \to k$. If this is easy for them to do,
i.e., if there is a high number of passes following the path
$j \to i \to k$, then this translates into a high clustering score
for player $i$, the one acting as middle-man. Conversely, if
there is a large imbalance in the numbers of passes involved,
then the clustering coefficient will be lower.
The average of all these coefficients gives us the global average
clustering coefficient for the team.

In addition to studying how clustered or fragmented a 
team is, we can compute the size of its maximal clique, 
where a clique is a sub-network in which all the nodes are 
linked by an arrow. A clique in a team represents a subset 
of players that are all pairwise-connected by direct passes. 
A well connected team will present a very large maximal 
clique, meaning that almost everybody gets to pass the ball 
to everybody else, whereas the size will be smaller for more 
fragmented teams. The analysis of cliques is the basis for 
finding communities within networks. 
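
For a network of at most 11 players, the size of the maximal clique can be found by simply testing every subset of players. The sketch below does so under the assumption that two players count as linked when at least one pass was made between them in either direction; a stricter reading would require passes in both directions. The function name and the toy matrix are illustrative.

```python
from itertools import combinations


def largest_clique_size(A):
    """Size of the largest group of players that are pairwise linked.

    Assumption: players a and b are linked if A[a][b] > 0 or A[b][a] > 0.
    Subsets are tested from the largest size downwards, so the first
    group found that is fully linked gives the answer.
    """
    n = len(A)
    linked = [[A[a][b] > 0 or A[b][a] > 0 for b in range(n)]
              for a in range(n)]
    for size in range(n, 1, -1):
        for group in combinations(range(n), size):
            if all(linked[a][b] for a, b in combinations(group, 2)):
                return size
    return min(n, 1)  # no linked pair: a lone player is a trivial clique


# Toy network: players 0, 1, 2 form a triangle; player 3 only
# receives passes from player 0.
size = largest_clique_size([[0, 1, 0, 2],
                            [0, 0, 3, 0],
                            [1, 0, 0, 0],
                            [0, 0, 0, 0]])
```

With 11 players there are only 2048 subsets to examine, so this exhaustive search is instantaneous even though it would not scale to large networks.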

Our initial attempt at studying communities within foot- 
ball teams has not provided any useful information, as the 
high degree of connectivity paired with the small number 
of nodes usually results in the existence of a single com- 
munity that includes every player. This said, one might be 
able to use some suitable variation of the notion of commu- 
nity to overcome this inconvenience. We will postpone the 
study of this problem for future work. 

5 Results and analysis 

We present in this section the results of the computation 
of the different measures presented in the previous sections 
for the teams that have participated in the 2010 FIFA World 
Cup. For reasons of space, we only analyze the teams that 
have made it to the knock-out phase, and focus especially 
on the final qualifiers, Spain vs the Netherlands, and the 
third place qualifiers, Germany vs Uruguay. The passing
networks and analysis of other teams can be found on our
website (see footnote 1).

5.1 Data 

The passing data for the 2010 FIFA World Cup games was 
downloaded from the official FIFA website using a custom 
Python script. The passing networks were then constructed
and analyzed using Sage [14] and NetworkX [15]. Graphics
were created with Wolfram Research's Mathematica.

As FIFA only provides the aggregate data over all the 
games, the passing networks were computed by dividing 
the number of passes by the total number of games played 
by each team. This introduces artifacts in some cases. This 
issue can be taken care of by conducting a per-game analy- 
sis, which was unfortunately not possible in our case. 

1 http://www.maths.qmul.ac.uk/~ht/footballgraphs/ 



Team          P     k     k_u   c^w    C_B    C_q
Argentina     227   4     5     27.9   2.7    8
Brazil        321   5     7     26.2   2.0    8
Chile         120         1     18.9   5.1    6
England       239   2     3     28.0   3.6    7
Germany       220   2     2     24.7   4.6    6
Ghana         184   3     4     15.5   3.5    8
Japan         180   1     5     28.9   3.3    8
Korea Rep.    227   3     5     24.4   2.6    8
Mexico        225               27.2   1.9    7
Netherlands   266   5     7     29.7   1.9    8
Paraguay      103         2     20.4   7.5    5
Portugal      175   3     4     14.6   4.1    7
Slovakia      166   3     6     18.5   3.0    7
Spain         417   3     5     30.0   1.9    9
USA           160   1     4     16.0   4.6    7
Uruguay       117   2     3     14.3   4.8    6

Table 1: Data for the teams in the round of 16. P: average
number of passes; k: edge connectivity; k_u: undirected
connectivity; c^w: average clustering; C_B: average
betweenness; C_q: largest clique. The highest two values
(except for clique) are highlighted.

5.2 Teams in the last 16 

The centrality and clustering scores of the teams that made
it to the last 16 stage are shown in Table 1. Note that the betweenness
and clustering scores are expressed as percentages
of the theoretical maximum.

The main point to note about these results is that Spain, 
the tournament winner and the team that arguably played 
the best football, has the highest number of passes, cluster- 
ing and size of clique. It also has a high-end edge connec- 
tivity, while keeping a low betweenness score. All of this 
is a reflection of the 'total football' or 'tiki-taka' style of 
Spain, in which well-connected players constantly pass the 
ball around. This is also confirmed by the passing network
(see Figure 1) and the individual players' scores, discussed
in the next subsection.

Other teams obtaining scores similar to Spain include
the Netherlands (finishing second in the tournament) and
Brazil, followed by Argentina. At the lower end, Paraguay,
with its low degree of connectivity and high betweenness,
appears as a disconnected team relying too much on a few
players.

5.3 Spain vs the Netherlands 

Tables 2 and 3 show the closeness, betweenness, pagerank
and clustering scores of the players of Spain and the Netherlands,
respectively, in their formations used in the final.

Player       C_i     C_B(i)   x_i     c_i^w
Casillas     16.52   0.00     3.29    20.46
Pique        17.32   3.92     11.46   30.70
Puyol        16.32   2.86     7.92    27.12
Iniesta      14.60   0.50     8.54    31.03
Villa        8.68    0.50     5.89    23.96
Xavi         18.28   1.19     14.66   46.47
Capdevila    16.54   6.12     10.56   29.91
Alonso       17.11   1.19     12.31   41.69
Ramos        16.45   2.41     9.02    27.05
Busquets     18.55   2.41     12.99   35.32
Pedro        3.42    0.00     3.35    16.75

Table 2: Player scores for Spain. The two highest scores
are highlighted.

Although there are some data artifacts due to the averaging
of the data over several games (which, again, would
be sorted out by performing a per-game analysis), the overall
conclusion that we reach from these results is that there
is a high correlation between high scores in closeness, pagerank
and clustering, which tends to confirm the general perception
of the players' performance reported in the media
at the time of the tournament. A remarkable example of this
correlation is the high scores displayed by Xavi, arguably
the leading player of the Spanish team.

On the Spanish side, one should also note that the betweenness
scores are low and uniformly distributed, a sign
of a well-balanced passing strategy, and that the clustering
scores are consistently high, showing that Spain is an extremely
well-connected team, in which almost all players help each other
by offering themselves as passing options. An exception is
Pedro, whose low scores are explained by the fact that he is
a forward and was normally not playing games for their entire
duration. Incidentally, note that forwards can almost always
be identified as those players having the lowest closeness,
betweenness and pagerank, as they are isolated players
waiting to receive passes, as well as players who get
replaced more often.

The scores of the Dutch team are, overall, close to those
of the Spanish team, particularly the clustering, but there
are some notable differences. First, there is a clear difference
in the density of passes, seen in Figure 1. Second, the
Dutch players are not as close to each other (as measured
by d) and have pageranks that are more evenly distributed,
thus showing that none has a predominant role in the passing
scheme. Finally, Figure 1 shows an unbalanced use of
the pitch, giving a clear preference to the left side.

5.4 Germany vs Uruguay 

Tables 4 and 5 show the same data as in the previous subsection
but now for Germany and Uruguay. The formations
used in this case are those of the semifinals. 

Player           C_i     C_B(i)   x_i     c_i^w
Stekelenburg     16.34   0.32     7.63    28.35
Van Der Wiel     14.43   2.97     9.79    31.39
Heitinga         16.23   2.67     11.06   31.34
Mathijsen        17.30   1.30     10.84   33.22
V. Bronckhorst   15.74   1.12     10.07   37.00
Van Bommel       12.46   3.08     11.19   32.36
Kuyt             7.97    1.67     9.02    27.06
De Jong          10.95   2.73     9.28    28.36
Van Persie       6.89    2.92     5.88    20.13
Sneijder         10.91   2.17     10.32   33.77
Robben           5.91    0.16     4.91    23.91

Table 3: Player scores for the Netherlands.

Player           C_i     C_B(i)   x_i     c_i^w
Neuer            7.58    0.37     4.74    21.54
Friedrich        9.29    3.55     10.08   24.99
Khedira          8.70    10.58    11.38   26.31
Schweinsteiger   10.28   13.17    17.32   27.35
Ozil             7.54    4.34     10.05   22.62
Podolski         4.91    0.22     6.66    30.21
Klose            0.92    0.00     2.48    14.34
Trochowski       3.00    0.00     2.85    33.02
Lahm             10.60   11.83    14.65   24.56
Mertesacker      10.81   3.42     13.27   26.71
Boateng          6.85    3.63     6.52    19.85

Table 4: Player scores for Germany.

The results for these two teams point to a major difference
in their connectedness: Germany, with its closeness
scores and high clustering, is overall more connected than
Uruguay. However, as its pagerank scores are not as spread
out as those of Uruguay, it seems to be depending more on
the efforts of a few players to pass the ball around. Lahm
and Schweinsteiger, in particular, are playing a central role
in their team, not dissimilar to Xavi's.

6 Further work 

The passing networks that we have presented provide an at- 
tractive visual summary or 'snapshot' of a football team's 
style. The obvious limitation of these networks is of course 
that they are static. But, as we have seen, they can be 
complemented with the computation of centrality measures 
that provide useful information about the importance and 
connectedness of individual players, which might benefit 
coaches, sports journalists and their readers. 

Player       C_i    C_B(i)   x_i     c_i^w
Muslera      0.88   1.98     4.62    9.96
Godin        1.80   4.20     8.37    11.99
Gargano      0.76   0.37     2.98    7.33
Victorino    1.75   0.88     5.59    14.28
Cavani       1.61   10.22    10.22   13.68
Forlan       2.08   10.29    13.12   15.02
A. Pereira   1.90   3.75     9.01    16.45
Perez        2.36   10.63    15.25   19.12
M. Pereira   2.28   1.51     12.21   20.07
Arevalo      2.45   5.85     13.12   19.83
Caceres      1.34   3.65     5.52    9.51

Table 5: Player scores for Uruguay.

There are many additional features that could be added
to the networks to obtain a more detailed analysis. An immediate
one would be to add an extra node representing the
opponent's goal and consider shots instead of passes for arrows
directed at the goal. This concept, with one node for
shots on target and one for wide shots, has been previously
used in [4].

Another interesting aspect to consider would be to study 
the accuracy of passes by adding to each player a weight 
taking into account the probability for a pass coming from 
that player to be successful. There are different levels of 
complexity that one might want to get into here, as not all 
passes are equally likely to succeed or fail. But, as a first 
approximation, one might just want to use the percentage 
of completed passes as a measurement of accuracy. 

Finally, let us mention that the defensive strength of a 
team could also be incorporated in the model by tracking 
passing interceptions and recovered balls. 